CF-RAG: A Dataset and Method for Carbon Footprint QA Using Retrieval-Augmented Generation
Zhao, Kaiwen, Balaji, Bharathan, Lee, Stephen
Product sustainability reports provide valuable insights into the environmental impacts of a product and are often distributed in PDF format. These reports often include a combination of tables and text, which complicates their analysis. The lack of standardization and the variability in reporting formats further exacerbate the difficulty of extracting and interpreting relevant information from large volumes of documents. In this paper, we tackle the challenge of answering questions related to carbon footprints within sustainability reports available in PDF format. Unlike previous approaches, our focus is on addressing the difficulties posed by the unstructured and inconsistent nature of text extracted from PDF parsing. To facilitate this analysis, we introduce CarbonPDF-QA, an open-source dataset containing question-answer pairs for 1735 product report documents, along with human-annotated answers. Our analysis shows that GPT-4o struggles to answer questions with data inconsistencies. To address this limitation, we propose CarbonPDF, an LLM-based technique specifically designed to answer carbon footprint questions on such datasets. We develop CarbonPDF by fine-tuning Llama 3 with our training data. Our results show that our technique outperforms current state-of-the-art techniques, including question-answering (QA) systems fine-tuned on table and text data.
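The abstract describes a retrieval-augmented QA pipeline over text parsed from PDF reports. A minimal sketch of that pattern is shown below; the bag-of-words scoring, chunk contents, and prompt template are illustrative assumptions, not the CarbonPDF implementation (which fine-tunes Llama 3 and handles inconsistent extractions).

```python
# Sketch of retrieval-augmented QA over parsed report text:
# score chunks against the question, keep the top-k, build a prompt.
from collections import Counter
import math

def tokenize(text):
    return [w.lower().strip(".,?:") for w in text.split()]

def score(query, chunk):
    # Bag-of-words overlap with length normalization; a real system
    # would use dense embeddings for retrieval.
    q, c = Counter(tokenize(query)), Counter(tokenize(chunk))
    return sum((q & c).values()) / math.sqrt(len(c) + 1)

def build_prompt(query, chunks, k=2):
    top = sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]
    context = "\n".join(top)
    return f"Answer using the report excerpts below.\n{context}\nQ: {query}\nA:"

chunks = [
    "Total product carbon footprint: 320 kg CO2e over the full life cycle.",
    "The enclosure is made of 60% recycled aluminum.",
    "Manufacturing accounts for 70% of the carbon footprint.",
]
prompt = build_prompt("What is the product carbon footprint?", chunks)
```

The assembled prompt would then be passed to the fine-tuned model; the retrieval step is what lets the model answer from a specific report rather than from memory.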
WAS: Dataset and Methods for Artistic Text Segmentation
Xie, Xudong, Li, Yuzhe, Liu, Yang, Zhang, Zhifei, Wang, Zhaowen, Xiong, Wei, Bai, Xiang
Accurate text segmentation results are crucial for text-related generative tasks, such as text image generation, text editing, text removal, and text style transfer. Recently, some scene text segmentation methods have made significant progress in segmenting regular text. However, these methods perform poorly in scenarios containing artistic text. Therefore, this paper focuses on the more challenging task of artistic text segmentation and constructs a real artistic text segmentation dataset. One challenge of the task is that the local stroke shapes of artistic text are changeable with diversity and complexity. We propose a decoder with the layer-wise momentum query to prevent the model from ignoring stroke regions of special shapes. Another challenge is the complexity of the global topological structure. We further design a skeleton-assisted head to guide the model to focus on the global structure. Additionally, to enhance the generalization performance of the text segmentation model, we propose a strategy for training data synthesis, based on the large multi-modal model and the diffusion model. Experimental results show that our proposed method and synthetic dataset can significantly enhance the performance of artistic text segmentation and achieve state-of-the-art results on other public datasets.
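The layer-wise momentum query described in the abstract can be read as an exponential moving average that carries query information across decoder layers, so stroke regions attended to early are not forgotten later. The sketch below is one plausible reading under that assumption; the actual decoder details are in the paper.

```python
# Illustrative layer-wise momentum query: each decoder layer's output is
# blended into a running query vector rather than replacing it outright.
def momentum_query(layer_outputs, momentum=0.9):
    """Blend each layer's attention output into a running query vector."""
    q = layer_outputs[0][:]  # initialize from the first layer's output
    for out in layer_outputs[1:]:
        q = [momentum * qi + (1 - momentum) * oi for qi, oi in zip(q, out)]
    return q

# With momentum=0.5 the first layer's signal decays but never vanishes.
q = momentum_query([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]], momentum=0.5)
```

The point of the momentum term is stability: a query that moves slowly across layers keeps attending to thin or unusual stroke regions that a freshly re-computed query might drop.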
HomOpt: A Homotopy-Based Hyperparameter Optimization Method
Abraham, Sophia J., Maduranga, Kehelwala D. G., Kinnison, Jeffery, Carmichael, Zachariah, Hauenstein, Jonathan D., Scheirer, Walter J.
Machine learning has achieved remarkable success over the past couple of decades, often attributed to a combination of algorithmic innovations and the availability of high-quality data available at scale. However, a third critical component is the fine-tuning of hyperparameters, which plays a pivotal role in achieving optimal model performance. Despite its significance, hyperparameter optimization (HPO) remains a challenging task for several reasons. Many HPO techniques rely on naive search methods or assume that the loss function is smooth and continuous, which may not always be the case. Traditional methods, like grid search and Bayesian optimization, often struggle to quickly adapt and efficiently search the loss landscape. Grid search is computationally expensive, while Bayesian optimization can be slow to prime. Since the search space for HPO is frequently high-dimensional and non-convex, it is often challenging to efficiently find a global minimum. Moreover, optimal hyperparameters can be sensitive to the specific dataset or task, further complicating the search process. To address these issues, we propose a new hyperparameter optimization method, HomOpt, using a data-driven approach based on a generalized additive model (GAM) surrogate combined with homotopy optimization. This strategy augments established optimization methodologies to boost the performance and effectiveness of any given method with faster convergence to the optimum on continuous, discrete, and categorical domain spaces. We compare the effectiveness of HomOpt applied to multiple optimization techniques (e.g., Random Search, TPE, Bayes, and SMAC) showing improved objective performance on many standardized machine learning benchmarks and challenging open-set recognition tasks.
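The core homotopy idea in the abstract is to deform an easy surrogate problem into the hard objective, tracking the minimizer as the blend shifts. The sketch below illustrates that mechanism with a toy blend H(x, t) = (1 - t)·surrogate(x) + t·objective(x); the quadratic surrogate and grid search are illustrative assumptions, not HomOpt's GAM surrogate.

```python
# Homotopy optimization sketch: start on the easy surrogate, then slide
# t from 0 to 1, re-minimizing the blended objective at each step.
def homotopy_minimize(objective, surrogate, candidates, steps=5):
    best = min(candidates, key=surrogate)  # solve the easy problem first
    for i in range(1, steps + 1):
        t = i / steps
        # A real method would warm-start a local search from the previous
        # minimizer; here we re-scan a small candidate grid for clarity.
        best = min(candidates, key=lambda x: (1 - t) * surrogate(x) + t * objective(x))
    return best

objective = lambda x: (x - 2.0) ** 2 + 0.5    # true loss, minimum at x = 2
surrogate = lambda x: x ** 2                  # smooth stand-in, minimum at x = 0
candidates = [i * 0.5 for i in range(-4, 9)]  # grid from -2.0 to 4.0
x_star = homotopy_minimize(objective, surrogate, candidates)
```

Because each stage starts near the previous stage's minimizer, the search follows a continuous path of solutions instead of attacking the non-convex objective cold.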
A Dataset and Method for Hallux Valgus Angle Estimation Based on Deep Learning
Xu, Ningyuan, Zhuang, Jiayan, Wu, Yaojun, Xiao, Jiangjian
Angular measurement is essential for planning reasonable treatment of hallux valgus (HV), a common forefoot deformity. However, it still depends on manual labeling and measurement, which is time-consuming and sometimes unreliable, so automating the process is of real concern. Two obstacles stand in the way: there is a lack of datasets, and the keypoint-based methods that have been so successful in pose estimation are not suitable for this field. To solve these problems, we built a dataset and developed an algorithm based on deep learning and linear regression, which shows a strong fit to the ground truth.
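The quantity the paper automates is an angle between bone axes; given two endpoints per axis, it follows directly from the dot product. The keypoint layout below is an illustrative assumption, not the paper's annotation scheme.

```python
# Angle between two bone axes (e.g. first metatarsal vs. proximal phalanx)
# from two keypoints per axis, via the dot-product formula.
import math

def axis_angle_deg(p1, p2, q1, q2):
    """Angle in degrees between segments p1->p2 and q1->q2."""
    v = (p2[0] - p1[0], p2[1] - p1[1])
    w = (q2[0] - q1[0], q2[1] - q1[1])
    dot = v[0] * w[0] + v[1] * w[1]
    norm = math.hypot(*v) * math.hypot(*w)
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

# Metatarsal axis pointing straight up, phalanx deviating laterally.
angle = axis_angle_deg((0, 0), (0, 10), (0, 10), (5, 20))
```

A keypoint-free regression model, as the abstract suggests, would predict the angle directly; the formula above is what a keypoint-based pipeline would compute as its final step.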
Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering
Gao, Haoyuan, Mao, Junhua, Zhou, Jie, Huang, Zhiheng, Wang, Lei, Xu, Wei
In this paper, we present the mQA model, which is able to answer questions about the content of an image. The answer can be a sentence, a phrase or a single word. Our model contains four components: a Long Short-Term Memory (LSTM) to extract the question representation, a Convolutional Neural Network (CNN) to extract the visual representation, an LSTM for storing the linguistic context in an answer, and a fusing component to combine the information from the first three components and generate the answer. We construct a Freestyle Multilingual Image Question Answering (FM-IQA) dataset to train and evaluate our mQA model. It contains over 150,000 images and 310,000 freestyle Chinese question-answer pairs and their English translations.
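The fusing component described in the abstract combines the question LSTM state, the CNN image feature, and the answer LSTM context into one vector from which the next answer word is predicted. The sketch below shows a concatenate-then-linear fusion; the toy dimensions and the specific fusion form are illustrative assumptions.

```python
# Sketch of an mQA-style fusion step: concatenate the component
# representations and apply a linear layer to produce a fused vector.
def fuse(question_rep, image_rep, answer_rep, weights, bias):
    """Concatenate the three representations, then apply W @ x + b."""
    x = question_rep + image_rep + answer_rep  # list concatenation
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weights, bias)]

# Toy dimensions: 2-d question + 2-d image + 2-d answer context -> 3-d output.
# This W simply picks one coordinate from each component for clarity.
W = [[1, 0, 0, 0, 0, 0],
     [0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 1, 0]]
b = [0.0, 0.0, 0.0]
out = fuse([0.1, 0.2], [0.3, 0.4], [0.5, 0.6], W, b)
```

In the full model this fused vector feeds a softmax over the answer vocabulary at each decoding step, so all four components jointly condition every generated word.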
Could AI Help Reform Academic Publishing?
As someone whose work crosses many disciplines, I spend a fair bit of my days skimming new developments across not only computer science but the humanities, social sciences, arts, and many other fields, looking for connections and unexpected approaches that might benefit my own work. The intensely siloed nature of academia is well known, but equally striking is how rapidly citation standards are falling in a Google Scholar world of explosively growing available knowledge, in which scholars seem genuinely unaware of developments across the rest of their own field, let alone the rest of academia. Could machine learning approaches dramatically reform the "related work" and citation review component of peer review and academic publishing? Perhaps the most striking element of modern scholarship is that, in an era when so much of it is available through web and academic database searches, it takes only a few mouse clicks to compile a cross-section of recent developments in a given space. Yet peruse the "related work" or "background" section of a typical academic paper and it is remarkable just how discipline-specific and artificially circumscribed the set of references is.